Tracking unbounded Topic Streams

Authors

  • Dominik Wurzer
  • Victor Lavrenko
  • Miles Osborne
Abstract

Tracking topics on social media streams is non-trivial because the number of topics mentioned grows without bound. This complexity is compounded when we want to track such topics against other fast-moving streams. We go beyond traditional small-scale topic tracking and consider a stream of topics against another document stream. We introduce two tracking approaches which are fully applicable to true streaming environments. When tracking 4.4 million topics against 52 million documents in constant time and space, we demonstrate that, counter to expectations, simple single-pass clustering can outperform locality-sensitive hashing for nearest-neighbour search on streams.
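
To make the idea of single-pass clustering with a bounded memory footprint concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class name SinglePassClusterer, the bag-of-words cosine similarity, the threshold and max_clusters parameters, and the evict-oldest policy are all assumptions chosen for illustration. Only the number of clusters is bounded here; a real system would also need to cap the size of each centroid.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    if not a or not b:
        return 0.0
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


class SinglePassClusterer:
    """Hypothetical single-pass clusterer; each document is seen exactly once."""

    def __init__(self, threshold=0.5, max_clusters=1000):
        self.threshold = threshold        # assumed similarity cut-off
        self.max_clusters = max_clusters  # assumed cap on stored clusters
        self.centroids = {}               # cluster id -> Counter, oldest first
        self.next_id = 0

    def add(self, text):
        """Assign a document to its most similar cluster, or open a new one."""
        vec = Counter(text.lower().split())
        best_id, best_sim = None, 0.0
        for cid, centroid in self.centroids.items():
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= self.threshold:
            # Fold the document into the matching centroid.
            self.centroids[best_id].update(vec)
            return best_id
        if len(self.centroids) >= self.max_clusters:
            # Evict the oldest cluster so the cluster count stays bounded.
            oldest = next(iter(self.centroids))
            del self.centroids[oldest]
        cid = self.next_id
        self.next_id += 1
        self.centroids[cid] = vec
        return cid


if __name__ == "__main__":
    clusterer = SinglePassClusterer(threshold=0.5, max_clusters=2)
    for doc in ["earthquake hits city centre",
                "earthquake hits city suburbs",
                "football final tonight"]:
        print(doc, "->", clusterer.add(doc))
```

Each incoming document is compared against at most max_clusters centroids, so the work per document does not grow with the length of the stream. Locality-sensitive hashing, the alternative named in the abstract, would instead hash each document into buckets and compare it only against documents that collide with it.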

Similar resources

The Stor-e-Motion Visualization for Topic Evolution Tracking in Social Media Streams

Nowadays, there are plenty of sources generating massive amounts of text streams in a continuous way. For example, the increasing popularity and the active use of social networks result in voluminous and fast-flowing data streams containing a large amount of user-generated data about almost any topic around the world. However, the observation and tracking of the ongoing evolution of topics in ...

Topic Detection & Tracking (TDT) Overview & Perspective

Topic Detection and Tracking (TDT) refers to automatic techniques for finding topically related material in streams of data (e.g., newswire and broadcast news). Work on TDT began about a year ago, is now expanding, and will be a regular feature at future Broadcast News workshops.

The Stor-e-Motion Visualization for Topic Evolution Tracking in Text Data Streams

Nowadays, there are plenty of sources generating massive amounts of text data streams in a continuous way. For example, the increasing popularity and the active use of social networks result in voluminous and fast-flowing text data streams containing a large amount of user-generated data about almost any topic around the world. However, the observation and tracking of the ongoing evolution of to...

Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification

Data Streams are unbounded, sequential data instances that are generated very rapidly. The storage, querying and mining of such rapid flows of data is computationally very challenging. Data Stream Mining (DSM) is concerned with the mining of such data streams in real-time using techniques that require only one pass through the data. DSM techniques need to be adaptive to reflect changes of the p...

Emerging User Intentions: Matching User Queries with Topic Evolution in News Text Streams

Topic and event evolution analysis aimed at trend detection and tracking (TDT) in news data streams has gained considerable interest in recent years. Consolidated studies have concentrated on identifying and visualizing dynamically evolving text patterns from news data streams. Detecting and understanding user behavior and relating user intentions to emerging topic trends in news da...

Publication date: 2015